Coronaviruses have evolved diverse mechanisms to recognize
different receptors for their cross-species transmission and host-range
expansion. Mouse hepatitis coronavirus (MHV) uses the
N-terminal domain (NTD) of its spike protein as its receptor-binding
domain. Here we present the crystal structure of MHV
NTD complexed with its receptor murine carcinoembryonic antigen-related
cell adhesion molecule 1a (mCEACAM1a). Unexpectedly,
MHV NTD contains a core structure that has the same β-sandwich
fold as human galectins (S-lectins) and additional structural motifs
that bind to the N-terminal Ig-like domain of mCEACAM1a. Despite
its galectin fold, MHV NTD does not bind sugars, but instead binds
mCEACAM1a exclusively through protein-protein interactions. Critical
contacts at the interface have been confirmed by mutagenesis,
providing a structural basis for viral and host specificities of
coronavirus/CEACAM1 interactions. Sugar-binding assays reveal that
galectin-like NTDs of some coronaviruses such as human coronavirus
OC43 and bovine coronavirus bind sugars. Structural analysis
and mutagenesis localize the sugar-binding site in coronavirus NTDs
to a position above the β-sandwich core. We propose that coronavirus
NTDs originated from a host galectin and retained sugar-binding
functions in some contemporary coronaviruses, but evolved new
structural features in MHV for mCEACAM1a binding.


coronavirus evolution | galectin-like N-terminal domain of coronavirus spike 
proteins | carcinoembryonic antigen-related cell adhesion molecule 1 
binding by coronaviruses | sugar binding by coronaviruses 


Coronaviruses use a variety of cellular receptors and coreceptors,
including proteins and sugars. The diverse use of
receptors has allowed coronaviruses to infect a wide range of 
mammalian and avian species and cause respiratory, enteric, sys- 
temic, and neurological diseases. How coronaviruses have evolved 
to do so has been a major puzzle in virology. To solve this puzzle, 
we have investigated the structural basis for the complex receptor- 
recognition mechanisms of coronaviruses. 

The Coronaviridae family of large, enveloped, positive-stranded 
RNA viruses consists of at least three major genera or groups (Table 
S1). Aminopeptidase-N (APN) is the receptor for porcine trans- 
missible gastroenteritis virus (TGEV), porcine respiratory corona- 
virus (PRCoV), and human coronavirus 229E (HCoV-229E) from 
group 1 (1-3). Carcinoembryonic antigen-related cell adhesion 
molecule 1 (CEACAM1), a member of the carcinoembryonic anti- 
gen family in the Ig superfamily, is the receptor for mouse hepatitis 
coronavirus (MHV) from group 2 (subgroup 2a) (4, 5). Angiotensin- 
converting enzyme 2 (ACE2) is the receptor for human coronavirus 
NL63 (HCoV-NL63) from group 1 and human severe acute re- 
spiratory syndrome coronavirus (SARS-CoV) from group 2 (sub- 
group 2b) (6, 7). Sugars serve as receptors or coreceptors for TGEV 
from group 1, bovine coronavirus (BCoV) and human coronavirus 
OC43 (HCoV-OC43) from group 2a, and avian infectious bronchitis 
virus (IBV) from group 3 (8-14). The receptor is unknown for some
coronaviruses such as group 2a human coronavirus HKU1 (HCoV- 
HKU1). The diversity in receptor use is a distinctive feature of the 
Coronaviridae family and a few other virus families such as retro- 
viruses and paramyxoviruses (15, 16). 

The characteristic large spikes on coronavirus envelopes are 
composed of trimers of the spike protein. The spike protein 
mediates viral entry into host cells by functioning as a class I viral 


10696-10701 | PNAS | June 28,2011 | vol. 108 | no. 26 


fusion protein (17). During maturation, the spike protein is often 
cleaved into a receptor-binding subunit S1 and a membrane- 
fusion subunit S2 that associate together through noncovalent 
interactions (Fig. 1A). S1 sequences are relatively well conserved
within each coronavirus group, but differ markedly between dif- 
ferent groups. S1 contains two independent domains, N-terminal 
domain (NTD) and C domain, that can both serve as viral receptor- 
binding domains (RBDs) (Table S1). C domain binds to APN or
ACE2 in coronaviruses that use them as receptors (3, 18-24), 
whereas NTD binds to CEACAM1 in MHV or sugar in TGEV (9, 
25). The sugar-binding domain has not yet been identified in the 
spike proteins of HCoV-OC43, BCoV, or IBV. The only atomic 
structures available for coronavirus S1 are C domains of HCoV- 
NL63 and SARS-CoV, each complexed with their common re- 
ceptor ACE2 (23, 24). Despite marked differences in their struc- 
tures, the two C domains bind to overlapping regions on ACE2 
(23, 24). Structural information has been lacking for any corona- 
virus S1 NTD. 

MHV, the prototypic and extensively studied coronavirus, causes
a variety of murine diseases. Strain A59 (MHV-A59) is primarily
hepatotropic, whereas strain JHM (MHV-JHM) is neurotropic.
Murine CEACAM1 (mCEACAM1), whose primary physiological
functions are to mediate cell adhesion and signaling, is the principal
receptor for all MHV strains (26). mCEACAM1a is broadly
expressed in epithelial cells, endothelial cells, and macrophages, but
its expression level is low in the central nervous system and is
restricted to endothelial and microglial cells (27). mCEACAM1
is encoded by two alleles to produce mCEACAM1a and -1b;
mCEACAM1a is a much more efficient MHV receptor than
mCEACAM1b (28, 29). mCEACAM1a contains either two [D1 and
D4] or four [D1-D4] Ig-like domains in tandem, a result of alternative
mRNA splicing (26). The crystal structure of mCEACAM1a[1,4]
shows that a CC′ loop (loop connecting β-strands C and C′) in
the V-set Ig-like domain D1 encompasses key MHV-binding residues
including Ile41 (30). Curiously, although mammalian CEACAM1
proteins are significantly conserved (Tables S2 and S3), only
murine CEACAM1a can serve as an efficient MHV receptor (31).
Also, although group 2a coronavirus spike proteins are significantly
conserved (Tables S2 and S3), only MHV spike protein interacts
with mCEACAM1a (31). The molecular determinants for the viral
and host specificities of coronavirus/CEACAM1 interactions
remain elusive.

Despite the lack of sequence homology between coronavirus 
spike proteins and any known sugar-binding proteins (lectins), 
sugar moieties on host cell membranes such as glycoproteins, gly- 
colipids, and glycosaminoglycans play important roles in host cell 
infections by many coronaviruses (8). HCoV-OC43 and BCoV 
spike proteins recognize cell-surface components containing N- 
acetyl-9-O-acetylneuraminic acid (Neu5,9Ac2) (13, 14). These 
viruses also contain a hemagglutinin-esterase (HE) that functions
as a receptor-destroying enzyme and aids viral detachment from 
sugars on infected cells (32). MHV spike protein does not bind 
sugars (33), and the HE genes of many MHV strains are present but 
not expressed (34). TGEV spike protein recognizes N-glycolylneur- 
aminic acid (Neu5Gc) and N-acetylneuraminic acid (Neu5Ac), and 
such sugar-binding activities are required for the enteric tropism 
of TGEV (9). PRCoV spike protein, an NTD-deletion mutant of 
TGEV spike protein, fails to bind sugars, and hence PRCoV has 
respiratory tropism only (3, 10). IBV spike protein recognizes 
Neu5Ac (11, 12). Overall, many coronavirus spike proteins can
function as viral lectins, but the molecular nature of their lectin 
activities is a mystery. 

How did coronavirus spike proteins originate and evolve, and 
how did the evolutionary changes in their spike proteins allow 
coronaviruses to explore novel cellular receptors and expand their 
host ranges? To address these questions, we have determined the 
crystal structure of MHV-A59 NTD complexed with mCEACAM1a[1,4].
The structure has elucidated the receptor recognition
mechanism of MHV and identified the determinants of the
viral and host specificities of coronavirus/CEACAM1 interactions.
Furthermore, the NTD structure has unexpectedly revealed the 
structural basis for the lectin activities of coronavirus spike proteins 
and provided structural insights into the origin and evolution of 
coronavirus spike proteins. 


Results and Discussion 


Structure Determination. To prepare an MHV-A59 NTD frag- 
ment suitable for crystallization, we designed a series of C- 
terminal truncation constructs of MHV-A59 NTD based on its 
secondary structure predictions. One NTD fragment containing 
residues 1-296 was well expressed in insect cells and stable in solution.
It bound to mCEACAM1a[1,4] to form a 1:1 heterodimeric
complex with Kd of 21.4 nM (Fig. 1B). We crystallized this complex
in space group P6,22, a = 76.4 Å, b = 76.4 Å, and c = 942.1 Å
(Table S4), with two complexes per asymmetric unit (Fig. S1). The
structure was determined by single-wavelength anomalous diffraction
(SAD) phases using selenomethionine-labeled mCEACAM1a.

Fig. 1. Structure of MHV-NTD/mCEACAM1a complex. (A)
Domain structure of MHV spike protein. NTD: N-terminal
domain; RBD: receptor-binding domain; HR-N: heptad-repeat
N; HR-C: heptad-repeat C; TM: transmembrane anchor;
IC: intracellular tail. The signal peptide corresponds to
residues 1-14 and is cleaved during molecular maturation
(45). Structures and functions of gray areas have not been
clearly defined. (B) Kinetics and binding affinity of NTD and
mCEACAM1a. (C) Structure of NTD/mCEACAM1a complex.
Two β-sheets of the NTD core are in green and magenta,
respectively; receptor-binding motifs (RBMs) are in red;
other parts of the NTD are in cyan; mCEACAM1a is in yellow;
and virus-binding motifs (VBMs) are in blue. N*:
N terminus; C*: C terminus. (D) Sequence and secondary
structures of NTD. β-Strands are shown as arrows, and the
disordered region as a dashed line.

The phases were subsequently improved by an averaging method
(35). We refined the structure at 3.1 Å resolution (Table S4). The
final model contains residues 15-268 of NTD (except for a disordered
loop from residues 40-64) and residues 1-202 of mCEACAM1a.
The model also contains glycans N-linked to viral residue 192 and to 
receptor residues 37, 55, and 70. 
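
To put the quoted cell dimensions in perspective, the volume of a hexagonal unit cell follows from V = a² · c · sin 120°. The sketch below is illustrative only and is not part of the reported analysis: the function name is hypothetical, and the count of 12 asymmetric units per cell is an assumption based on the general-position multiplicity of hexagonal space groups with point group 622.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Volume (in cubic angstroms) of a hexagonal cell with a = b and gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


# Cell edges quoted in the text, in angstroms.
A_EDGE = 76.4
C_EDGE = 942.1

# Assumption: 12 asymmetric units per cell (point group 622).
ASYM_UNITS_PER_CELL = 12
# Stated in the text: two complexes per asymmetric unit.
COMPLEXES_PER_ASYM_UNIT = 2

volume = hexagonal_cell_volume(A_EDGE, C_EDGE)
complexes_per_cell = ASYM_UNITS_PER_CELL * COMPLEXES_PER_ASYM_UNIT
volume_per_complex = volume / complexes_per_cell

print(f"cell volume: {volume:.3e} A^3")
print(f"volume per complex: {volume_per_complex:.3e} A^3")
```

Dividing the volume per complex by the molecular mass of the complex would give a Matthews coefficient; the mass is not stated in this section, so that step is left out.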


Structure of MHV-NTD/mCEACAM1a Complex. The core of MHV
NTD is a 13-stranded β-sandwich with two antiparallel β-sheets
stacked against each other through hydrophobic interactions (Fig. 1
C and D and Fig. 2A). Surprisingly, this β-sandwich core of MHV
NTD has a galectin fold, which will be discussed in detail later.
Outside the core structure, MHV NTD contains several peripheral
structural elements (Fig. 1C and Fig. 2A). Three loops from the
“upper” β-sheet of the core converge with the N-terminal segment to
form a distinct, receptor-contacting substructure; these four structural
elements are termed receptor-binding motifs (RBMs). Three
disulfide bonds, connecting cysteines 21-158, 165-246, and 153-187,
reinforce MHV NTD (Fig. 2B).

MHV NTD binds to domain D1 of mCEACAM1a (Fig. 1C and
Fig. 2A). There is no significant structural change in mCEACAM1a
before and after MHV binding (Fig. S2). The four RBMs of MHV
NTD contact two virus-binding motifs (VBMs) on the CC′C″ face
of mCEACAM1a (Fig. 1C and Fig. 2A). A total of 14 residues in
NTD interact with a total of 17 residues in mCEACAM1a (Fig. 3
A-C). The binding buries 1,500 Å² at the interface (Fig. 2B). This
buried area is intermediate between that of the SARS-CoV/ACE2
interface (1,700 Å²) and that of the HCoV-NL63/ACE2 interface
(1,300 Å²), but all three viral receptor-binding domains bind to
their respective protein receptors with similar affinities (23).
Notably, none of the observed or predicted glycans is involved in
MHV-NTD/mCEACAM1a interactions (Fig. 2B), and therefore
MHV-NTD/mCEACAM1a binding depends exclusively on
protein-protein interactions.


Detailed MHV-NTD/mCEACAM1a Interactions. The interface between
MHV NTD and mCEACAM1a is dominated by hydrophobic
interactions with scattered polar interactions. Two hydrophobic

Fig. 2. Structural details of MHV-NTD/mCEACAM1a interface. (A) Another
view of the MHV-NTD/mCEACAM1a structure, which is derived by rotating
the one in Fig. 1C 90° clockwise along a vertical axis. Virus-binding motif 1
(VBM1) on MHV NTD includes strands βC, βC′, and βC″ and loops CC′, C′C″,
and C″D. VBM2 on MHV NTD corresponds to loop FG. (B) Distribution of
glycosylation sites and disulfide bonds. Glycans and glycosylated asparagines 
are in magenta, and cysteines are in yellow. The orientation of the structure 
is the same as in Fig. 1C. (C) A hydrophobic patch at the interface that is 
important for MHV-NTD/mCEACAM1a binding. MHV residues are in ma- 
genta, and mCEACAM1a residues are in green. The orientation of the 
structure is derived by rotating the structure in Fig. 1C 180° along a vertical 
axis. (D) Another hydrophobic patch at the interface that is important for 
MHV-NTD/mCEACAM1a binding. The orientation of the structure is slightly
adjusted from the one in Fig. 1C. 


patches stand out. One centers on Ile41 from the CC′ loop
of mCEACAM1a. Ile41 is surrounded by the hydrophobic side
chains of MHV Tyr15, Leu89, and Leu160 and the aliphatic side
chains of MHV Gln159 and Arg20 (Fig. 2C). Additionally, MHV
Arg20 forms a bifurcated hydrogen bond with the main chain
carbonyl group of receptor Thr39 and another hydrogen bond with
MHV Gln159. Receptor Arg96 forms a bifurcated salt bridge with
receptor Asp89, while stacking with MHV Arg20. The second
critical hydrophobic patch involves multiple hydrophobic residues
from both MHV and mCEACAM1a that include MHV residues
Ile22 and Tyr162 and receptor residues Val49, Met54, and Phe56
(Fig. 2D). Additionally, MHV Asn26 forms a hydrogen bond with
the main chain amide group of receptor Thr57. In protein-protein
interactions, hydrophobic interactions contribute more to binding
energy, whereas hydrophilic interactions contribute more to binding
specificity. The above key hydrogen bonds between NTD side
chains and the receptor main chain help bring the adjacent
hydrophobic patches into place. These structural analyses suggest
that the hydrophobic patches and additional polar interactions
provide significant binding energy and specificity to
MHV-NTD/mCEACAM1a binding interactions.

The importance of contact residues at the NTD/mCEACAM1a
interface has been confirmed by mutagenesis data. Here we compared
the efficiency of mCEACAM1a-dependent cell entry by lentiviruses
pseudotyped with wild-type or mutant MHV-A59 spike
protein (Fig. 3D). Our data, together with published data (Fig. 3E),
showed that single mCEACAM1a substitution I41G and single
NTD substitutions I22A, Y162A, Y162Q, Y162H, but not Y162F,
abrogated viral infectivity (28, 36), underscoring the significance
of the two hydrophobic patches. Furthermore, single NTD substitutions
R20A, R20K, and N26A significantly decreased viral infectivity,
confirming the significance of the hydrogen bonds between
Arg20 and mCEACAM1a and between Asn26 and mCEACAM1a.
A naturally occurring Q159L mutation in MHV NTD caused small
viral plaques (Fig. 3E) (37), suggesting that the hydrogen bond between
Gln159 and Arg20 helps position Arg20 to interact with
mCEACAM1a. Therefore, through evolution MHV-A59 appears to
have optimized many of its interactions with mCEACAM1a, and
thus substitutions in MHV NTD that disrupt these specific molecular
interactions weaken or abrogate viral infectivity.

Our study provides the structural basis for the viral and host
specificities of coronavirus/CEACAM1 interactions. On the basis
of structural analyses and mutagenesis data, we determined that
MHV NTD contains Arg20, Ile22, Asn26, and Tyr162, all of which
form energetically favorable interactions with mCEACAM1a, whereas
other group 2a coronavirus NTDs contain residues at the corresponding
positions that are expected to disrupt critical hydrophobic
or polar interactions with mCEACAM1a (Fig. 3B and Fig. S3). On
the other hand, on the basis of structural analyses, we found that
mCEACAM1a contains Ile41, Val49, Met54, and Phe56, all of
which form energetically favorable interactions with MHV,
whereas mCEACAM1b, bovine CEACAM1a and -1b, and human
CEACAM1 contain residues at the corresponding positions that
likely disrupt these favorable interactions with MHV (Fig. 3C and
Fig. S3). For example, hydrophobic residues Ile41 and Phe56 in
mCEACAM1a become hydrophilic residues Thr41 and Thr56 in
mCEACAM1b (Fig. 3C and Fig. S3). These results reveal the
mechanisms whereby MHV uses only mCEACAM1a, and not
mCEACAM1b or CEACAM1 from cattle or humans, as its receptor,
and whereby other group 2a coronaviruses cannot use
mCEACAM1a as a receptor.


Sugar Binding by Coronavirus NTDs. The β-sandwich core of MHV
NTD shares the same 11-stranded fold as human galectins (S-lectins)
and rotavirus VP4 (viral lectin) (38, 39), augmented by
two additional β-strands in the “lower” β-sheet (Fig. 4 and Fig. S4).
MHV NTD and human galectin-3 have a Dali Z-score of 7.8 and
an rmsd value of 2.9 Å over 137 matching Cα atoms (40). Importantly,
the topologies of their β-sandwich cores are identical (Fig.
S4). This unexpected structural homology between MHV NTD and
human galectins suggests that coronavirus NTDs may function as
viral lectins. To test this possibility, we designed NTD constructs of
other group 2 coronaviruses that correspond to the crystallized
MHV-NTD fragment based on the sequence alignment of these
spike proteins. We expressed and purified each of these coronavirus
NTDs and detected their binding interactions with mucin,
a mixture of highly glycosylated proteins containing all of the sugar
moieties (Neu5,9Ac2, Neu5Gc, and Neu5Ac) recognized by the
coronavirus spike proteins. Results showed that NTDs of HCoV-OC43
and BCoV bound sugars, whereas NTDs of MHV-A59,
HCoV-HKU1, and SARS-CoV did not (Fig. 5). Removal of neuraminic
acids from mucin by neuraminidase treatment prevented
binding of HCoV-OC43 or BCoV NTD.
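
The rmsd quoted above is a root-mean-square distance over paired Cα atoms. As a minimal sketch of that quantity (not of Dali itself, which also determines the residue pairing and the superposition), the function below assumes the two coordinate lists are already superposed and matched one-to-one. The function name and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square distance between paired, pre-superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: three atom pairs, each displaced by exactly 2 angstroms along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
print(rmsd(a, b))  # every pair is 2 angstroms apart, so the rmsd is 2.0
```

In a real comparison the 137 matched Cα positions would be read from the two coordinate files after structural alignment.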

Why do the NTDs of HCoV-OC43 and BCoV, but not that of
MHV, bind sugars, and where is the sugar-binding site located in
coronavirus NTDs? In human galectin-3, which binds galactose, the
sugar-binding site (site A) is located above the β-sandwich core and
involves the 10-11 loop (loop connecting β10 and β11) (Fig. 4 B
and E) (39). In rotavirus VP4, which binds sialic acids, site A is
blocked by a two-stranded β-sheet; instead, the sugar-binding site
(site B) is located in a groove between the two β-sheets of the
β-sandwich core (Fig. 4 C and F) (38). In MHV NTD, site B is
blocked due to the narrowed groove between the two β-sheets of
the β-sandwich core, whereas site A is open and available (Fig. 4A
and D). However, compared with human galectin-3, MHV NTD
has a markedly shortened 10-11 loop that may be responsible for
its lack of lectin activity. HCoV-OC43 and BCoV NTDs likely
share the same galectin fold as MHV NTD due to their high sequence
similarities (Fig. S1), but they both contain longer 10-11
loops than MHV NTD (Fig. 3B and Fig. S3) and thus may use site
A for sugar binding. To test this hypothesis, we modified the 10-11
loops in both BCoV and HCoV-OC43 NTDs, using MHV NTD as
a reference (Fig. 3B and Fig. S3). For both BCoV and HCoV-OC43
NTDs, the mutant and wild-type proteins were equally well
expressed and stable in solution, but the mutant proteins (OC43*
and BCoV*) lacked sugar-binding activities (Fig. 5). These observations
confirm that the 10-11 loops are critical for sugar binding in
both BCoV and HCoV-OC43 NTDs. A more refined description of
the sugar-binding site in BCoV and HCoV-OC43 NTDs awaits
future structural and biochemical studies.


Coronavirus Receptor Use and Evolution. To date, three crystal
structures are available for RBDs of coronavirus S1: group 2a
MHV NTD, group 2b SARS-CoV C domain (24), and group 1


Fig. 3. ... pseudotyped with wild-type or mutant MHV-A59 spike
proteins. SEs are shown. (E) Published mutagenesis data
on MHV NTD (28, 36, 37).


HCoV-NL63 C domain (23) (Fig. 6). Because of the significant 
sequence similarities of the S1 subunits of the spike proteins within 
each coronavirus group, the six-stranded β-sandwich core structure
of the HCoV-NL63 C domain likely exists in other group 1 coronaviruses
(23), and the five-stranded β-sheet core structure of the
SARS-CoV C domain likely exists in other group 2 coronaviruses 
(24). Similarly, the galectin-like NTD of MHV likely exists in other 
group 2 coronaviruses. The folds of group 1 and group 3 corona- 
virus NTDs are less clear. However, because both TGEV NTD and 
IBV S1 have lectin activities, the galectin-fold core structure of 
group 2a coronavirus NTDs may also be found in both group 1 and 
group 3 coronaviruses in similar or variant forms. The present 
study advances our understanding of the structures and functions 
of coronavirus spike proteins and the complex receptor-recognition 
mechanisms of coronaviruses. 




Fig. 4. Structural comparisons of MHV NTD, human galectins,
and rotavirus VP4. (A) MHV NTD. The orientation of the
structure is the same as in Fig. 1C. (B) Human galectin-3
[Protein Data Bank (PDB) 1A3K]. The β-sandwich core is labeled
and colored the same as in MHV NTD. Bound galactose
is in yellow. (C) Rotavirus VP4 (PDB 1KQR). Bound sialic acid is
in yellow. (D) Another view of MHV S1 NTD, which is derived
by rotating the structure in A counterclockwise along a vertical
axis. Arrow indicates loop 10-11. (E) Another view of human
galectin-3. Site A indicates its galactose-binding site. (F)
Another view of rotavirus VP4. Site B indicates its sialic-acid-binding
site.

Fig. 5. Sugar-binding assays of group 2 coronavirus NTDs. (A) Dot-blot
overlay assay. Measured were the binding interactions between coronavirus 
NTDs and sugar moieties on mucin-spotted nitrocellulose membranes. The 
membranes were either mock-treated or treated with neuraminidase (Nase) 
beforehand. Sugar-binding NTDs were detected using antibodies against 
their C-terminal His tags. (B) ELISA in which mucin-coated plates were used 
instead of nitrocellulose membranes. Sugar-binding NTDs were detected 
using ELISA substrates, and absorbance of the resulting yellow color was 
read at 450 nm. SEs are shown. BCoV* and OC43*: BCoV and HCoV-OC43
NTDs whose 10-11 loops have been replaced by that of MHV NTD. 


How did coronavirus spike NTDs originate and evolve? We
propose that an ancestral coronavirus acquired a galectin-like domain
from its host. Subsequently, an ancestral group 2a coronavirus
incorporated an HE gene into its genome to aid viral detachment
from sugars on infected cells. Later, the galectin-like NTD of MHV
evolved additional novel structural elements that allowed it to bind
mCEACAM1a. Using a protein receptor instead of sugar receptors
greatly enhanced the attachment affinity between MHV and mu- 
rine cells, making sugar-binding functions dispensable. Accord- 
ingly, MHV-A59 underwent changes in the sugar-binding site of its 
NTD, lost its sugar-binding activity, and stopped expressing its HE 
gene. In contrast, the galectin-like NTDs of some contemporary 
coronaviruses such as HCoV-OC43, BCoV, and TGEV retain the 
lectin activity, although their sugar specificities have diverged in 
three coronavirus groups and differ from those of contemporary 
human galectins. Some TGEV strains deleted their NTD to be- 
come PRCoV after their C domain acquired APN-binding affinity. 
In addition to coronaviruses, there is evidence that paramyxoviruses 
may also have acquired their RBD from a host (although with a β-propeller
fold) and used it to bind sugar or protein receptors (41, 42).
It seems that viruses use a common evolutionary strategy by ac- 
quiring host proteins and evolving them into viral RBDs with different 
receptor specificity. This strategy allows viruses to explore novel cellular 
receptors and expand their host ranges. The current study provides 
critical structural information that illustrates how coronaviruses have 
successfully used this strategy. 


Materials and Methods 


Protein Purification and Crystallization. Both MHV-A59 S1 NTD (residues 1-296)
and mCEACAM1a[1,4] (residues 1-202) were expressed in Sf9 insect cells using 
the bac-to-bac system (Invitrogen) and then purified as previously described 
(24). Briefly, the proteins containing C-terminal His tags were harvested from 
Sf9 cell supernatants, loaded onto a nickel-nitrilotriacetic acid (Ni-NTA) col- 
umn, eluted from the Ni-NTA column with imidazole, and further purified 
by gel filtration chromatography on Superdex 200 (GE Healthcare). To purify 
the NTD/mCEACAM1a complex, NTD was incubated with excess mCEACAM1a
before the complex was purified by gel filtration chromatography and con- 
centrated to 10 mg/mL. Crystals of the NTD/mCEACAM1a complex were 
grown in sitting drops at 4 °C over wells containing 10% PEG6000 and 0.2 M 

Fig. 6. Structures, functions, and evolution of S1 subunits of coronavirus
spike proteins. The three known crystal structures are indicated by “Structure.”
Among these structures, MHV NTD has a 13-stranded galectin-like
β-sandwich fold, HCoV-NL63 C domain has a six-stranded β-sandwich fold,
and the SARS-CoV C domain has a five-stranded β-sheet fold.


CaCl2. Crystals were harvested in 2 wk; stabilized in 10% PEG6000, 0.2 M
CaCl2, and 30% ethylene glycol; and flash-frozen in liquid nitrogen. 

Selenomethionine-labeled mCEACAM1a was expressed in Sf9 cells as 
previously described (43). Briefly, 24 h postinfection, cells were transferred 
into medium without methionine for methionine depletion. After 4 h, cells 
were transferred into medium without methionine but supplemented with 
50 mg/mL selenomethionine for selenomethionine labeling. After 36 h, cells 
were harvested and selenomethionine-labeled protein was purified using 
the same procedure as above. 


Structure Determination and Refinement. X-ray data were collected at Ad- 
vanced Photon Source Northeastern Collaborative Access Team beamlines. 
The crystal contains two complexes per asymmetric unit. From a selenome- 
thionine-labeled crystal, 12 selenomethionine sites in mCEACAM1a were 
identified. SAD phases were then calculated. The phases were subsequently 
improved by twofold noncrystallographic symmetry (NCS) averaging within 
the crystal and cross-crystal averaging with the mCEACAM1a crystal (35).
During both density modification and structure refinement, the twofold NCS
restraint was applied separately to each of the three domains (NTD and domains D1
and D4 of mCEACAM1a) between the two complexes in each asymmetric
unit. This is because domains D1 and D4 undergo a hinge movement relative
to each other in the two complexes due to the flexibility of the domain linker.
The structure was refined to a final Rfree of 30.8% and Rwork of 24.8%. Data,
phasing, and refinement statistics are shown in Table S4. Software used for data
processing, structure determination, and refinement is also listed in Table S4.
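
For readers less familiar with the refinement statistics quoted above, the crystallographic R-factor is conventionally defined as R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|. Rwork is computed over the reflections used in refinement and Rfree over a held-out test set. The sketch below illustrates the formula only; it assumes the observed and calculated amplitudes are already on a common scale, and the amplitudes shown are hypothetical toy values, not data from this study.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R-factor over paired structure-factor amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and equal in length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes: a working set and a small held-out test set.
work_obs = [100.0, 200.0, 300.0, 400.0]
work_calc = [90.0, 210.0, 300.0, 380.0]
free_obs = [150.0, 250.0]
free_calc = [120.0, 270.0]

r_work = r_factor(work_obs, work_calc)  # (10 + 10 + 0 + 20) / 1000
r_free = r_factor(free_obs, free_calc)  # (30 + 20) / 400
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because the test-set reflections are excluded from refinement, Rfree is normally somewhat higher than Rwork, as in the values reported here.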


Kinetics and Binding Affinity of MHV S1 NTD and mCEACAM1a by Surface
Plasmon Resonance Using Biacore. The kinetics and binding affinity of MHV S1
NTD and mCEACAM1a were measured by surface plasmon resonance using
a Biacore 3000. The surface of a C5 sensor chip was first activated with
N-hydroxysuccinimide, MHV S1 NTD was then injected and immobilized to the
surface of the chip, and the remaining activated surface of the chip was
blocked with ethanolamine. Soluble mCEACAM1a was introduced at a flow
rate of 20 μL/min at different concentrations. Kinetic parameters were
determined using BIA-EVALUATIONS software.
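
For a simple 1:1 binding model, the equilibrium dissociation constant is the ratio of the dissociation and association rate constants, Kd = koff / kon, and the fraction of immobilized NTD occupied at a given analyte concentration is [L] / ([L] + Kd). The sketch below illustrates these two relations. The function names are hypothetical, the rate constants are round toy values rather than the fitted parameters from this experiment, and only the Kd of 21.4 nM is taken from the text.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Kd (M) from k_on (1/(M*s)) and k_off (1/s) for a 1:1 interaction."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def fraction_bound(analyte_conc: float, kd: float) -> float:
    """Equilibrium occupancy of the immobilized partner in a 1:1 model."""
    return analyte_conc / (analyte_conc + kd)


# Toy rate constants (hypothetical, for illustration only).
toy_kd = dissociation_constant(k_on=1.0e5, k_off=1.0e-3)
print(f"toy Kd = {toy_kd:.2e} M")

# Occupancy at several analyte concentrations, using the reported Kd of 21.4 nM.
REPORTED_KD = 21.4e-9
for conc_nm in (5.0, 21.4, 100.0, 500.0):
    occupancy = fraction_bound(conc_nm * 1e-9, REPORTED_KD)
    print(f"{conc_nm:6.1f} nM -> {occupancy:.2f} bound")
```

At an analyte concentration equal to Kd, half of the immobilized NTD is occupied, which is one way to read the reported affinity.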


mCEACAM1a-Dependent Cell Entry of Lentiviruses Pseudotyped with MHV-A59 
Spike Protein. Lentiviruses pseudotyped with MHV-A59 spike protein were 
produced as previously described (44). Briefly, plasmid encoding wild-type or 
mutant MHV-A59 spike protein was cotransfected into 293T cells with helper 
plasmid psPAX2 and reporter plasmid pLenti-GFP at molar ratio 1:1:1 using 
polyethyleneimine (Polysciences Inc.). Forty-eight hours posttransfection, the 
resulting pseudotyped viruses were harvested and inoculated onto the 293 
cells expressing mCEACAM1a. After overnight incubation, cells were washed twice with
PBS and lifted by 100 μL of 0.25% trypsin and 0.38 mg/mL EDTA. After being
washed twice with PBS, cells were fixed with 1% paraformaldehyde and an- 
alyzed for GFP expression by flow cytometry. All experiments were repeated 
at least three times. The expression levels of spike proteins in pseudotyped 
viruses were measured by Western blotting using polyclonal goat antibody to
MHV-A59 spike protein (Fig. S5).


Sugar-Binding Assays of Coronavirus S1 NTDs. Sugar-binding assays of coronavirus
S1 NTDs were performed as previously described (13, 33). Briefly, for
the dot-blot overlay assay, 10 μg of bovine submaxillary gland mucin (BSM)
(Sigma-Aldrich) was spotted onto nitrocellulose membranes. The mem- 
branes were dried completely, blocked with BSA at 4 °C overnight, and ei- 
ther mock-treated or treated with 20 mU/mL Arthrobacter ureafaciens 
neuraminidase (Roche Applied Science) at 25 °C for 2 h. The membranes 
were then incubated with 1 μM coronavirus S1 NTDs containing a C-terminal
His tag at 4 °C for 2 h, washed five times with PBS, incubated with anti-His 
antibody (Invitrogen) at 4 °C for 2 h, washed five times with PBS again, in- 
cubated with HRP-conjugated goat anti-mouse IgG antibody (1:5,000) at 4 °C 
for 2 h, and washed five times with PBS. Finally, the bound proteins were 
detected using a chemiluminescence reagent (ECL plus, GE Healthcare). For 
the ELISA, BSM (60 μg/mL in PBS) was coated at 4 °C overnight in the wells of
96-well Maxisorp plates (Nunc). The wells were treated in the same way as in 
the dot-blot overlay assay. Femto-ELISA-HRP substrates were added and 
incubated at room temperature. The reaction was stopped with 1 N HCl, and 
the absorbance of the resulting yellow color was read at 450 nm. 
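
The figure legends report SEs for the ELISA and pseudovirus-entry measurements. A minimal sketch of how a mean and standard error could be computed from replicate absorbance readings is given below, assuming the SE is the sample standard deviation divided by the square root of the number of replicates. The function name and the readings are hypothetical and do not come from this study.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_se(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for a standard error")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Hypothetical triplicate A450 readings for one NTD, mock-treated vs neuraminidase-treated.
readings = {
    "mock": [1.20, 1.35, 1.28],
    "neuraminidase": [0.11, 0.09, 0.13],
}
for condition, values in readings.items():
    mean, se = mean_and_se(values)
    print(f"{condition}: {mean:.3f} +/- {se:.3f}")
```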